environments, and embodiments in keeping with the scope 
and spirit of the invention. Some embodiments described 
may include, inter alia, systems, integrated circuit chips, 
methods, and computer-readable media containing instruc- 
tions. 

A system is described that includes rate monitors that 
measure the rate at which traffic arrives for each of the 
outputs of the system and includes a set of rate-controlled 
virtual output queues at each input line card. In one embodi- 
ment, there is one queue for each output of the system. Some 
embodiments further include a rate assignment mechanism 
that determines what rates should be assigned to each of the 
rate-controlled virtual output queues. These rate-controlled 
virtual output queues also include a mechanism for adjusting 
the rates at which packets are sent to the outputs of the 
system. These rate-controlled virtual output queues may 
include the mechanism for automatically determining and 
adjusting their sending rates, or receive this information 
from another source (e.g., another component, external 
source, etc.). In one embodiment, these sending rates are 
adjusted based on received flow control information. 

The system receives flow control information correspond- 
ing to the status of each of the outputs of the system. In one 
embodiment, the system includes an interconnection net- 
work that maintains separate internal buffers for each of the 
different output links and sends XON and XOFF flow 
control signals to the input ports as necessary to regulate the 
flow of packets to different outputs. The ability to control 
input rates within a system is not limited to any particular 
flow control scheme. Numerous mechanisms are known in 
the art for accumulating and distributing flow control infor- 
mation in systems, including those for use in packet switch- 
ing and other communications systems, and therefore, are 
not discussed with particularity herein. 

In one embodiment, a rate monitor M(i,j) for traffic from 
input i to output j includes a state machine S(i,j) with three 
states: unconstrained, off and backlogged. If output j is not 
congested (e.g., the total traffic going to output j does not 
exceed the bandwidth of the interface to the output line card) 
then S(i,j) is unconstrained. S(i,j) goes to the off state 
whenever the input line card at input i receives a flow control 
signal turning off traffic to output j. S(i,j) goes from the off 
state to the backlogged state whenever it receives a flow 
control signal turning on traffic to output j. S(i,j) goes from 
the backlogged state to the unconstrained state when the 
queue at input i for output j becomes empty. 

In one embodiment, when S(i,j) is unconstrained (e.g., the 
output is not congested), packets are sent to output j at their 
arrival rate. When S(i,j) is off (e.g., the output is in an off 
state), packets are not sent to output j. When S(i,j) is 
backlogged (e.g., the output is in a backlogged state), 
packets are sent to output j at a reduced pacing rate 
approximately proportional to their arrival rate. 

In one embodiment, the rate at which traffic arrives for 
congested outputs is monitored. One method of doing this is 
to keep a record of the last time t0 when the queue at input 
i for output j was empty and to count the number of packets, 
c, received since time t0. A measured average arrival rate, 
R(i,j), at time t is then equal to c/(t-t0). The pacing rate is 
then set according to the formula, pacing rate=f*R(i,j), 
where f is a parameter of the system and is called the 
acceleration factor. An alternative to measuring the average 
arrival time from the last time the queue was empty is to 
measure the average arrival time during successive 
measurement intervals while the queue remains non-empty. This 
can be done, for example, by clearing c periodically and at 
the same time setting t0 equal to the current time. This 
approach allows the pacing rate to more quickly adapt to 
changes in the rate at which traffic arrives. In other 
embodiments, the pacing rate is determined with additional 
parameters. For example, in systems which support packets of 
varying lengths, the pacing rate may be based on the size of 
the received packets (e.g., total bytes, etc.), rather than, or in 
addition to, a count of packets. 

Different embodiments employ various acceleration factors 
f, which may substantially vary between different systems. 
Acceleration factor f may be set at system configuration 
time or may be varied during the operation of the system 
based on some parameters, such as traffic congestion. In one 
embodiment, acceleration factor f is related to the speed-up 
factor of the packet switching fabric over the packet arrival 
rate. For example, in one embodiment of a system having a 
speed-up factor of 1.3, an acceleration factor f of 
approximately 1.2 is used. 

In one embodiment, each input i has a queue for each 
output and a queue scheduler that determines when packets 
are sent from each queue. At any point in time, a queue at 
input i for a backlogged output j is assigned a rate P(i,j) and 
the queue scheduler sends packets to output j at the assigned 
rate, whenever S(i,j)=backlogged (when S(i,j)=off, no 
packets are sent from input i to output j). 

Let T(i,j)=1/P(i,j) be the target time interval between 
successive packets sent from input i to output j. T(i,j) is 
expressed in units equal to the time it takes an input line card 
to send a packet to the interconnection network. 

In one embodiment, the queue scheduler is a data structure 
that comprises a set of "timing wheels." A timing wheel 
can be implemented as a one-dimensional array of linked 
lists. Each list contains a set of queue identifiers. The 
position of a list in the array is used to determine when the 
queue so identified should next send a packet to the output 
link. In the simplest case, a single timing wheel is used. In 
such an embodiment, indicators of outputs are stored in the 
timing wheel data structure until their scheduled time. At 
this time, the indicators are removed from the timing wheel 
data structure and placed in a transmit list. Items are 
removed from the transmit list and a packet corresponding 
to the output is sent, with an indicator for the output 
re-inserted into the timing wheel data structure in an 
appropriate time bin if packets remain to be sent to the output. 
The time bin into which a queue identifier is inserted is 
selected to produce the desired rate of transmission from that 
queue. For each queue, there is a parameter T(i,j) referred to 
as the inter-packet time for that queue. This parameter gives 
the average number of packet times between successive cell 
transmissions from the queue. To enable accurate rate 
specifications, the inter-packet time may be expressed in time 
units that are smaller than the time it takes to transmit a 
single packet. When a queue identifier is re-inserted into a 
time bin, a target transmission time is computed for the next 
packet to be sent from that queue. This target transmission 
time is equal to T(i,j) plus the target transmission time of the 
previous packet sent from the queue. The queue identifier is 
re-inserted into that time bin whose contents will be 
transferred to the transmit list at the time that is closest to the 
target transmission time. 

In one embodiment, each timing wheel also has a cursor 
which points to one of the lists in the array. The cursors are 
advanced from one position in the array to the next position 
in the array as time advances. The cursor for the first timing 
wheel is advanced at every time step (a time step being the 
time it takes an input line card to send a packet to the 
interconnection network). The cursor for the second timing 
wheel is advanced less frequently, the cursor for the third 
wheel 601 is maintained with the current time indicated by 
cursor 602. A transmit list 604 is also maintained to indicate 
outputs which are allowed to be sent a packet, and in which 
order. In the illustrated embodiment, timing wheel 601 and 
transmit list 604 both use linked list data structures and 
include output queue identifier elements 603A and 603B 
(which may be in the form of output queue identifier data 
structure 500 illustrated in FIG. 5A). 

At the current time indicated by cursor 602, output queue 
identifier elements 605 are moved from timing wheel 601 to 
the tail of transmit list 604. In parallel, the output queue 
identifier element 606 at the head of transmit list 604 is 
removed and a corresponding packet, stored in a packet 
queue (not shown) is sent to the corresponding output. If the 
output is in the "BACKLOGGED" state, the output queue 
identifier element 606 is rescheduled and placed in timing 
wheel 601 at an appropriate place corresponding to a next 
time to send the next packet to the corresponding output. In 
one embodiment, this next time is proportional to the 
measured and maintained average packet arrival rate for the 
output as previously discussed herein. 
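
The single-wheel operation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of the described embodiment: the names `TimingWheel`, `schedule`, `tick` and `send_times` are hypothetical, Python deques stand in for the linked lists, and the queue is assumed to always have packets waiting. The rescheduling rule follows the text: the next target transmission time is the previous target time plus the inter-packet time T, and the identifier is placed in the time bin closest to that target.

```python
from collections import deque


class TimingWheel:
    """Single timing wheel with a transmit list (illustrative names only)."""

    def __init__(self, num_bins):
        self.bins = [deque() for _ in range(num_bins)]  # one list of queue ids per time bin
        self.now = 0                  # current time step; the cursor is now % num_bins
        self.transmit_list = deque()  # queue ids allowed to send, in order
        self.target = {}              # queue id -> target transmission time (fractional)

    def schedule(self, queue_id, inter_packet_time):
        """Place queue_id in the bin closest to (previous target + T)."""
        previous = self.target.get(queue_id, float(self.now))
        target = previous + inter_packet_time
        self.target[queue_id] = target
        # Round to the nearest time step, but never earlier than the next step.
        slot = max(self.now + 1, int(target + 0.5))
        if slot - self.now >= len(self.bins):
            raise ValueError("inter-packet time exceeds the span of the wheel")
        self.bins[slot % len(self.bins)].append(queue_id)

    def tick(self):
        """Advance the cursor one step and move due ids to the transmit list tail."""
        self.now += 1
        due = self.bins[self.now % len(self.bins)]
        while due:
            self.transmit_list.append(due.popleft())


def send_times(inter_packet_time, steps, num_bins=16):
    """Return the time steps at which a single always-backlogged queue sends."""
    wheel = TimingWheel(num_bins)
    wheel.schedule("q0", inter_packet_time)
    sent = []
    for _ in range(steps):
        wheel.tick()
        while wheel.transmit_list:
            queue_id = wheel.transmit_list.popleft()
            sent.append(wheel.now)                       # a packet would be sent here
            wheel.schedule(queue_id, inter_packet_time)  # packets remain: re-insert
    return sent
```

With an inter-packet time of 2.5, `send_times(2.5, 10)` produces sends at steps 3, 5, 8 and 10. The gaps alternate between 2 and 3 steps and average 2.5 because the fractional target time, rather than the rounded bin position, is carried forward.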

One embodiment for maintaining the state of an output in 
response to received flow control information is illustrated 
in the flow diagram of FIG. 7A. Processing begins at process 
block 700 and proceeds to process block 705, where flow 
control information is received for an output. Next, as 
determined in process block 710, if the output's current state 
is "UNCONSTRAINED," then if an XOFF flow control 
signal is received as determined in process block 712, then 
the output's state is set to "OFF" in process block 714, and 
the packet count for the output is reset in process block 716. 

Otherwise, as determined in process block 720, if the 
output's current state is "OFF," then if an XON flow control 
signal is received as determined in process block 722, then 
if the output's output queue is empty as determined in 
process block 730, then the output's state is set to "UNCON- 
STRAINED" in process block 732. Otherwise, the output's 
state is set to "BACKLOGGED" in process block 734, and 
an output queue identifier corresponding to the output is 
placed in the transmit list in process block 736. 

Otherwise, the output is in the "BACKLOGGED" state, 
and as determined in process block 742, if an XOFF flow 
control signal is received, then the output's state is set to 
"OFF" in process block 744. 

Processing then returns to process block 705 to receive 
more flow control information. 
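
The flow of FIG. 7A can be summarized as a small transition function. The sketch below is a hedged illustration: the `State` enum, the `Output` class and the `on_flow_control` function are hypothetical names, signals are modeled as the strings "XON" and "XOFF", and the transmit list is a plain Python list. The comments map each action to the process block named in the text.

```python
from enum import Enum


class State(Enum):
    UNCONSTRAINED = "UNCONSTRAINED"
    OFF = "OFF"
    BACKLOGGED = "BACKLOGGED"


class Output:
    """Per-output bookkeeping kept at an input (hypothetical structure)."""

    def __init__(self):
        self.state = State.UNCONSTRAINED
        self.queue = []        # packets waiting for this output
        self.packet_count = 0  # packets counted for rate measurement


def on_flow_control(output, signal, transmit_list, output_id):
    """Apply one XON/XOFF signal to an output, following FIG. 7A."""
    if output.state is State.UNCONSTRAINED:
        if signal == "XOFF":
            output.state = State.OFF                # block 714
            output.packet_count = 0                 # block 716
    elif output.state is State.OFF:
        if signal == "XON":
            if not output.queue:
                output.state = State.UNCONSTRAINED  # block 732
            else:
                output.state = State.BACKLOGGED     # block 734
                transmit_list.append(output_id)     # block 736
    else:  # BACKLOGGED
        if signal == "XOFF":
            output.state = State.OFF                # block 744
```

Signals that do not apply to the current state (for example, XON while unconstrained) leave the state unchanged, which matches the flow returning to process block 705.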

The operation of one embodiment in response to a 
received packet is illustrated in FIG. 7B. Processing begins 
at process block 755, and proceeds to process block 760 
where a packet destined for a particular output is received. 
Next, in process block 765, the received packet is placed in 
an output queue corresponding to the output destination of 
the received packet. Next, as determined in process block 
770, if the current state of the output is "UNCON- 
STRAINED," then an output queue identifier is placed in the 
transmit list in process block 772. 

Otherwise, the packet count is increased for the output in 
process block 775. Then, as determined in process block 
780, if the output's current state is "BACKLOGGED," then 
if the output is not already scheduled in the transmit list as 
determined in process block 790, then an output queue 
identifier is placed at the end of the transmit list in process 
block 795. 

Processing then returns to process block 760 to receive 
more packets. 
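
The packet-arrival flow of FIG. 7B might be sketched as follows. The names `Output` and `on_packet` are hypothetical, states are modeled as strings, and the `scheduled` flag is an assumption introduced here to represent the "already scheduled" test of process block 790; the text does not say how that test is implemented.

```python
from collections import deque


class Output:
    """Per-output state at an input (hypothetical structure)."""

    def __init__(self):
        self.state = "UNCONSTRAINED"
        self.queue = deque()     # output queue for this destination
        self.packet_count = 0    # packets counted while the output is constrained
        self.scheduled = False   # assumption: True while an identifier is pending


def on_packet(outputs, transmit_list, output_id, packet):
    """Handle one received packet destined for output_id (FIG. 7B)."""
    out = outputs[output_id]
    out.queue.append(packet)                 # block 765
    if out.state == "UNCONSTRAINED":
        transmit_list.append(output_id)      # block 772: one identifier per packet
        return
    out.packet_count += 1                    # block 775
    if out.state == "BACKLOGGED" and not out.scheduled:  # blocks 780 and 790
        transmit_list.append(output_id)      # block 795
        out.scheduled = True
```

In the unconstrained state every packet contributes its own identifier, so packets leave at their arrival rate. In the backlogged state at most one identifier is pending per output, and in the off state the packet is only queued and counted.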

The operation of an embodiment for processing the transmit 
list is illustrated in FIG. 8. Processing begins at process 
block 800, and proceeds to process block 805, where an 
output queue identifier is removed from the head of the 
transmit list. Next, in process block 810, a packet is retrieved 
from the head of the indicated output queue and sent to the 
output. Next, as determined in process block 815, if the 
output's state is "BACKLOGGED", then, if the output 
queue corresponding to the output just sent a packet is empty 
as determined in process block 820, then the output's state 
is set to "UNCONSTRAINED" in process block 825. Oth- 
erwise, the output queue identifier is rescheduled in process 
block 830. 

Processing then returns to process block 805 to send more 
packets. 
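
One pass of the FIG. 8 loop can be written as a single function. This is a sketch under assumptions: `service_transmit_list` is a hypothetical name, `send` and `reschedule` are caller-supplied callbacks standing in for the transmission hardware and the timing mechanism, and every identifier on the transmit list is assumed to refer to a non-empty output queue.

```python
from collections import deque


def service_transmit_list(transmit_list, queues, states, send, reschedule):
    """Send one packet for the identifier at the head of the transmit list (FIG. 8)."""
    output_id = transmit_list.popleft()            # block 805
    send(output_id, queues[output_id].popleft())   # block 810
    if states[output_id] == "BACKLOGGED":          # block 815
        if not queues[output_id]:                  # block 820
            states[output_id] = "UNCONSTRAINED"    # block 825
        else:
            reschedule(output_id)                  # block 830
```

For example, with two queued packets for a backlogged output, the first call sends one packet and reschedules the output, and the second call sends the last packet and returns the output to the unconstrained state.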

For simplicity of understanding, some embodiments have 
been described herein using one type of data structures 
and/or elements. Typically, these data structures and ele- 
ments have been described in the form of a linked list. As is 
apparent to one skilled in the art, numerous other embodi- 
ments are possible which use one or more of a wide variety 
of data structures and elements in keeping with the scope 
and spirit of the invention. 

In the foregoing specification, the invention has been 
described with reference to specific exemplary embodiments 
thereof. It will, however, be evident that various modifications 
and changes may be made thereto without departing 
from the broader spirit and scope of the invention as set forth 
in the appended claims. The specification and drawings are, 
accordingly, to be regarded in an illustrative rather than a 
restrictive sense. 

What is claimed is: 

1. An apparatus comprising: 

a plurality of rate monitors to measure the rate at which 
traffic arrives for each of a plurality of outputs of a 
packet switching system; 

one or more state data structures indicating a state of each 
of the plurality of outputs of the packet switching 
system; and 

a rate-controlled virtual output queue for each of the 
plurality of outputs of the packet switching system, 
each of the rate controlled virtual output queues adjust- 
ing a rate at which packets are sent to a particular 
destination based at least in part on the measured traffic 
arrival rate for the particular destination and the state 
for the particular destination; 

wherein the one or more state data structures maintains an 
indication of one of at least three different states for 
each of the plurality of outputs of the packet switching 
system; and 

wherein packets are not sent to a particular output when 
the particular output is in a first state, packets are sent 
to the particular output at approximately the measured 
traffic arrival rate when the particular output is in a 
second state, and packets are sent to the particular 
output at a reduced rate approximately proportional to 
the measured traffic arrival rate when the particular 
output is in a third state. 

2. An input line card comprising the apparatus of claim 1. 

3. The apparatus of claim 1, wherein each of the rate- 
controlled virtual output queues includes a transmit list. 

4. The apparatus of claim 1, wherein each rate-controlled 
virtual output queue includes a timing mechanism. 

5. The apparatus of claim 1, wherein each of the plurality 
of rate monitors include one or more data structures 
maintaining an indication of a packet count and a reference time 
period. 

6. The apparatus of claim [illegible], wherein the timing 
mechanism includes one or more timing wheels. 

7. The apparatus of claim [illegible], wherein the rate-controlled 
virtual output queue comprises at least one scheduling data 
structure, said at least one scheduling data structure 
including scheduling information with a timing granularity greater 
than that of the timing mechanism. 

8. The apparatus of claim 6, wherein the one or more state 
data structures maintains an indication of one of at least 
three different states for each of the plurality of outputs of 
the packet switching system. 

9. The apparatus of claim 7, wherein said scheduling 
information includes a target time for sending a next packet. 

10. A method performed by a packet switching system, 
the method comprising: 

receiving packets at a first component of the packet 
switching system, at least a subset of the received 
packets being destined for a second component of the 
packet switching system; 

maintaining a state data structure indicating a state of the 
second component; 

maintaining a rate data structure reflective of an arrival 
rate at which packets destined for the second compo- 
nent are received at the first component; 

sending received packets to the second component at a 
first rate approximately proportional to the arrival rate 
when the state data structure indicates the second 
component is in a first state; and 

sending received packets to the second component at a 
second rate less than the first rate and greater than zero, 
and approximately proportional to the arrival rate when 
the state data structure indicates the second component 
is in a second state. 

11. The method of claim 10, wherein the first rate is 
approximately the arrival rate of the received packets. 

12. The method of claim 10, wherein the rate data 
structure includes a count of a subset of the received packets. 

13. The method of claim 10, wherein a set of possible 
states for the state of the second component includes an 
unconstrained state, an off state, and a backlogged state. 

14. The method of claim 13, further comprising sending 
no received packets to the second component from the first 
component when the state data structure indicates the second 
component is in an off state. 

15. A method performed by a packet switching system, 
the method comprising: 

receiving a plurality of packets, each of the received 
plurality of packets being destined for one or more of 
a plurality of outputs of the packet switching system; 

measuring a traffic arrival rate for each one of the plurality 
of outputs of the packet switching system, the traffic 
arrival rate reflective of the rate at which traffic arrives 
for a corresponding one of the plurality of outputs of 
the packet switching system; 

maintaining an indication of a state of said each one of the 
plurality of outputs of the packet switching system; 

sending received packets to a particular one of the plu- 
rality of outputs at a first rate approximately propor- 
tional to the measured traffic arrival rate for the par- 
ticular one of the plurality of outputs when the 
maintained state indication reflects the particular one of 
the plurality of outputs is in a first state; and 

sending received packets to the particular one of the 
plurality of outputs at a second rate less than the first 
rate and greater than zero, and approximately propor- 
tional to the measured traffic arrival rate for the par- 
ticular one of the plurality of outputs when the main- 
tained state indication reflects the particular one of the 
plurality of outputs is in a second state. 



16. The method of claim 15, wherein no packets are sent 
to a particular one of the plurality of outputs when the 
maintained state indication reflects the particular one of the 
plurality of outputs is in a third state. 

17. The method of claim 15, wherein said indications of 
said states of the plurality of outputs are updated based on 
received flow control information. 

18. The method of claim 15, wherein said method is 
performed by an input line card of the packet switching 
system. 

19. The method of claim 15, wherein measuring the traffic 
arrival rate includes maintaining a packet count and a time 
reference. 

20. The method of claim 15, further comprising: 

maintaining a packet queue for each output of the packet 
switching system; and 
placing each packet of the plurality of received packets in 
one of the plurality of packet queues based on a 
destination of said each packet. 

21. The method of claim 20, further comprising placing an 
indicator of a corresponding one of the plurality of packet 
queues in a transmit list upon arrival of a particular received 
packet having a destination of a selected one of the plurality 
of outputs being in the first state. 

22. The method of claim 15, wherein sending received 
packets to the particular one of the plurality of outputs at the 
second rate includes: 

sending one of the plurality of packets to the particular 
one of the plurality of outputs of the packet switching 
system; and 

rescheduling the particular one of the plurality of outputs 
of the packet switching system in a timing data struc- 
ture for a second scheduled time based upon the 
measured traffic arrival rate for the selected output. 

23. The method of claim 22, wherein sending received 
packets to the particular one of the plurality of outputs at the 
second rate includes retrieving a transmit indication corre- 
sponding to the particular one of the plurality of outputs of 
the packet switching system from the timing data structure 
at a first scheduled time. 

24. The method of claim 22, wherein the second sched- 
uled time reflects an actual time to send one of the plurality 
of packets to the selected output of the packet switching 
system rather than a time relative to a last sent packet to the 
selected output of the packet switching system. 

25. The method of claim 22, wherein the timing data 
structure includes one or more timing wheels. 

26. The method of claim 22, comprising maintaining a 
target time for the sending one of the plurality of packets, 
wherein the second scheduled time is approximately the 
target time. 

27. The method of claim 26, wherein the target time has 
a finer timing resolution than that of the timing data struc- 
ture. 

28. The method of claim 15, wherein sending received 
packets to the particular one of the plurality of outputs at the 
second rate includes: 

retrieving a transmit indication corresponding to a 
selected output of the plurality of outputs of the packet 
switching system from a timing data structure at a first 
scheduled time and placing the retrieved transmit indi- 
cation in a transmit list; 
removing the retrieved transmit indication from the trans- 
mit list and sending one of the plurality of packets to 
the corresponding selected output of the plurality of 
outputs of the packet switching system based on the 
retrieved transmit indication; and 

those for use in packet switching and other communications systems, and therefore, are 
not discussed with particularity herein. 

In one embodiment, a rate monitor M(i,j) for traffic from input i to output j 
includes a state machine S(i,j) with three states: unconstrained, off and backlogged. If 
output j is not congested (e.g., the total traffic going to output j does not exceed the 
bandwidth of the interface to the output line card) then S(i,j) is unconstrained. S(i,j) goes 
to the off state whenever the input line card at input i receives a flow control signal 
turning off traffic to output j. S(i,j) goes from the off state to the backlogged state 
whenever it receives a flow control signal turning on traffic to output j. S(i,j) goes from 
the backlogged state to the unconstrained state when the queue at input i for output j 
becomes empty. 

In one embodiment, when S(i,j) is unconstrained (e.g., the output is not 
congested), packets are sent to output j at their arrival rate. When S(i,j) is off (e.g., the 
output is in an off state), packets are not sent to output j. When S(i,j) is backlogged (e.g., 
the output is in a backlogged state), packets are sent to output j at a reduced pacing rate 
approximately proportional to their arrival rate. 
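
The three sending behaviors can be expressed as a short function of the state. The sketch below is illustrative only: the function names are hypothetical, and it assumes the measured arrival rate R(i,j)=c/(t-t0), the pacing rate f*R(i,j) and the target interval T(i,j)=1/P(i,j) given in this description. The value of f is a system parameter and is simply passed in.

```python
def measured_arrival_rate(count, now, t0):
    """R = c / (t - t0): packets received since the reference time t0."""
    if now <= t0:
        raise ValueError("now must be later than t0")
    return count / (now - t0)


def sending_rate(state, arrival_rate, f):
    """Rate at which packets are sent to output j for a given state S(i,j)."""
    if state == "off":
        return 0.0                    # no packets are sent
    if state == "unconstrained":
        return arrival_rate           # packets are sent at their arrival rate
    if state == "backlogged":
        return f * arrival_rate       # pacing rate = f * R(i,j)
    raise ValueError("unknown state: " + repr(state))


def inter_packet_time(rate):
    """T = 1 / P: target interval between successive packets, in packet times."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate
```

For instance, 30 packets counted over 100 packet times give a measured rate of 0.3 packets per packet time. Passing that rate with the backlogged state and f = 1.2 yields the pacing rate f*R(i,j), and `inter_packet_time` converts it to the interval used for scheduling.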

In one embodiment, the rate at which traffic arrives for congested outputs is 
monitored. One method of doing this is to keep a record of the last time t0 when the queue 
at input i for output j was empty and to count the number of packets, c, received since 
time t0. A measured average arrival rate, R(i,j), at time t is then equal to c/(t-t0). The 
pacing rate is then set according to the formula, pacing rate=f*R(i,j), where f is a 
parameter of the system and is called the acceleration factor. An alternative to measuring 
the average arrival time from the last time the queue was empty is to measure the average 
arrival time during successive measurement intervals while the queue remains non-empty. 
This can be done, for example, by clearing c periodically and at the same time setting t0 
equal to the current time. This approach allows the pacing rate to more quickly adapt to 
changes in the rate at which traffic arrives. In other embodiments, the pacing rate is 
determined with additional parameters. For example, in systems which support packets 




In re TURNER ET AL., Application No. 09/705,450 
Amendment A 

Amendments to the Claims: 

The listing of claims will replace all prior versions, and listings, of claims in the 
application: 

Listing of Claims: 

Claim 1 (canceled) 

Claim 2 (currently amended): An input line card comprising the apparatus of claim 1 - 



Claim 3 (currently amended): The apparatus of claim 1 claim 4, wherein the one or 
more state data structures maintains an indication of one of at least three different states for 
each of the plurality of outputs of the packet switching system. 

Claim 4 (currently amended): The apparatus of claim 3, An apparatus comprising: 

a plurality of rate monitors to measure the rate at which traffic arrives for each of a 
plurality of outputs of a packet switching system; 

one or more state data structures indicating a state of each of the plurality of outputs of 
the packet switching system; and 

a rate-controlled virtual output queue for each of the plurality of outputs of the packet 
switching system, each of the rate controlled virtual output queues adjusting a rate at which 
packets are sent to a particular destination based at least in part on a measured traffic arrival 
rate and a state for the particular destination; 

wherein the one or more state data structures maintains an indication of one of at least 
three different states for each of the plurality of outputs of the packet switching system; and 

wherein packets are not sent to a particular output when the particular output is in a 
first state, packets are sent to the particular output at approximately the measured traffic 
arrival rate when the particular output is in a second state, and packets are sent to the 
particular output at a reduced rate approximately proportional to the measured traffic arrival 
rate when the particular output is in a third state. 

Claim 5 (currently amended): The apparatus of claim 1 claim 4, wherein each of the 
rate-controlled virtual output queues includes a transmit list. 



Claim 6 (currently amended): The apparatus of claim 1 claim 4, wherein each 
rate-controlled virtual output queue includes a timing mechanism. 




Application/Control Number 09/705,450 Page 2 

Art Unit: 2666 

EXAMINER'S AMENDMENT 

1. An examiner's amendment to the record appears below. Should the changes and/or 
additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 
1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the 
payment of the issue fee. 

Authorization for this examiner's amendment was given in a telephone interview with 
Kirk D. Williams on 6/9/2005. 

The application has been amended as follows: 
In the claims: 

In lines 8-9 of claim 4, "a measured traffic arrival rate and a state for the particular 
destination" has been corrected to --the measured traffic arrival rate for the particular destination 
and the state for the particular destination--. 

Replace claim 7 with --Claim 7: The apparatus of claim [illegible], wherein the timing 
mechanism includes one or more timing wheels.-- 

Replace claim 8 with --Claim 8: The apparatus of claim [illegible], wherein the rate-controlled 
virtual output queue comprises at least one scheduling data structure, said at least one scheduling 
data structure including scheduling information with a timing granularity greater than that of the 
timing mechanism.-- 

2. The following is an examiner's statement of reasons for allowance: 

To further clarify the reasons for allowance, the prior art of record fails to disclose, in 
combination with the other limitations of the independent claims, the combination of the 




